1 Context

If you like to eat cereal, do yourself a favor and avoid this dataset at all costs. After seeing these data, eating Fruity Pebbles will never be the same for me again.

Content

Fields in the dataset:

• Name: name of cereal
• mfr: manufacturer of cereal
  ○ A = American Home Food Products
  ○ G = General Mills
  ○ K = Kelloggs
  ○ N = Nabisco
  ○ P = Post
  ○ Q = Quaker Oats
  ○ R = Ralston Purina
• type:
  ○ cold
  ○ hot
• calories: calories per serving
• protein: grams of protein
• fat: grams of fat
• sodium: milligrams of sodium
• fiber: grams of dietary fiber
• carbo: grams of complex carbohydrates
• sugars: grams of sugars
• potass: milligrams of potassium
• vitamins: vitamins and minerals (0, 25, or 100, indicating the typical percentage of the FDA-recommended amount)
• shelf: display shelf (1, 2, or 3, counting from the floor)
• weight: weight in ounces of one serving
• cups: number of cups in one serving
• rating: a rating of the cereals (possibly from Consumer Reports?)

2 Importing and Cleaning Data

2.1 Import Data

We will use the tidyverse library for importing and wrangling the data.

2.2 Data Cleaning and Wrangling

##  [1] "Nabisco"                     "Quaker Oats"                
##  [3] "Kellogs"                     "Kellogs"                    
##  [5] "Ralston Purina"              "General Mills"              
##  [7] "Kellogs"                     "General Mills"              
##  [9] "Ralston Purina"              "Post"                       
## [11] "Quaker Oats"                 "General Mills"              
## [13] "General Mills"               "General Mills"              
## [15] "General Mills"               "Ralston Purina"             
## [17] "Kellogs"                     "Kellogs"                    
## [19] "General Mills"               "Kellogs"                    
## [21] "Nabisco"                     "Kellogs"                    
## [23] "General Mills"               "Ralston Purina"             
## [25] "Kellogs"                     "Kellogs"                    
## [27] "Kellogs"                     "Post"                       
## [29] "Kellogs"                     "Post"                       
## [31] "Post"                        "General Mills"              
## [33] "Post"                        "Post"                       
## [35] "Post"                        "Quaker Oats"                
## [37] "General Mills"               "Post"                       
## [39] "Kellogs"                     "Kellogs"                    
## [41] "General Mills"               "Quaker Oats"                
## [43] "General Mills"               "American Home Food Products"
## [45] "Ralston Purina"              "Ralston Purina"             
## [47] "Kellogs"                     "General Mills"              
## [49] "Kellogs"                     "Kellogs"                    
## [51] "Kellogs"                     "General Mills"              
## [53] "Post"                        "Kellogs"                    
## [55] "Quaker Oats"                 "Quaker Oats"                
## [57] "Quaker Oats"                 "Quaker Oats"                
## [59] "Kellogs"                     "General Mills"              
## [61] "Kellogs"                     "Ralston Purina"             
## [63] "Kellogs"                     "Nabisco"                    
## [65] "Nabisco"                     "Nabisco"                    
## [67] "Kellogs"                     "Kellogs"                    
## [69] "Nabisco"                     "General Mills"              
## [71] "General Mills"               "General Mills"              
## [73] "General Mills"               "General Mills"              
## [75] "Ralston Purina"              "General Mills"              
## [77] "General Mills"
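
The vector of manufacturer names printed above is what one would get by expanding the one-letter mfr codes through a lookup table. The following is a minimal base R sketch of that idea, not necessarily the code used in this report; the object names `mfr_names`, `mfr` and `manufacturer_name` are hypothetical, and the five codes are those of the first five cereals.

```r
# Lookup table from the one-letter manufacturer code to its full name
# (spellings follow the printed output above)
mfr_names <- c(
  A = "American Home Food Products",
  G = "General Mills",
  K = "Kellogs",
  N = "Nabisco",
  P = "Post",
  Q = "Quaker Oats",
  R = "Ralston Purina"
)

# Codes of the first five cereals, used here as an illustration
mfr <- c("N", "Q", "K", "K", "R")

# Index the lookup table by code; unname() drops the code labels
manufacturer_name <- unname(mfr_names[mfr])
manufacturer_name
```

If the codes are stored as a factor, wrap them in `as.character()` before indexing, since a factor would otherwise index the lookup table by its integer level codes.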
##              Name      Manufacturer              Type          Calories 
##       "character"          "factor"          "factor"         "numeric" 
##           Protein               Fat            Sodium             Fibre 
##         "numeric"         "numeric"         "numeric"         "numeric" 
##     Carbohydrates             Sugar         Potassium          Vitamins 
##         "numeric"         "numeric"         "numeric"         "numeric" 
##             Shelf            Weight              Cups            Rating 
##          "factor"         "numeric"         "numeric"         "numeric" 
## Manufacturer_Name 
##       "character"
##      Name           Manufacturer   Type       Calories        Protein     
##  Length:77          A: 1         Cold:74   Min.   : 50.0   Min.   :1.000  
##  Class :character   G:22         Hot : 3   1st Qu.:100.0   1st Qu.:2.000  
##  Mode  :character   K:23                   Median :110.0   Median :3.000  
##                     N: 6                   Mean   :106.9   Mean   :2.545  
##                     P: 9                   3rd Qu.:110.0   3rd Qu.:3.000  
##                     Q: 8                   Max.   :160.0   Max.   :6.000  
##                     R: 8                                                  
##       Fat            Sodium          Fibre        Carbohydrates 
##  Min.   :0.000   Min.   :  0.0   Min.   : 0.000   Min.   :-1.0  
##  1st Qu.:0.000   1st Qu.:130.0   1st Qu.: 1.000   1st Qu.:12.0  
##  Median :1.000   Median :180.0   Median : 2.000   Median :14.0  
##  Mean   :1.013   Mean   :159.7   Mean   : 2.152   Mean   :14.6  
##  3rd Qu.:2.000   3rd Qu.:210.0   3rd Qu.: 3.000   3rd Qu.:17.0  
##  Max.   :5.000   Max.   :320.0   Max.   :14.000   Max.   :23.0  
##                                                                 
##      Sugar          Potassium         Vitamins      Shelf      Weight    
##  Min.   :-1.000   Min.   : -1.00   Min.   :  0.00   1:20   Min.   :0.50  
##  1st Qu.: 3.000   1st Qu.: 40.00   1st Qu.: 25.00   2:21   1st Qu.:1.00  
##  Median : 7.000   Median : 90.00   Median : 25.00   3:36   Median :1.00  
##  Mean   : 6.922   Mean   : 96.08   Mean   : 28.25          Mean   :1.03  
##  3rd Qu.:11.000   3rd Qu.:120.00   3rd Qu.: 25.00          3rd Qu.:1.00  
##  Max.   :15.000   Max.   :330.00   Max.   :100.00          Max.   :1.50  
##                                                                          
##       Cups           Rating      Manufacturer_Name 
##  Min.   :0.250   Min.   :18.04   Length:77         
##  1st Qu.:0.670   1st Qu.:33.17   Class :character  
##  Median :0.750   Median :40.40   Mode  :character  
##  Mean   :0.821   Mean   :42.67                     
##  3rd Qu.:1.000   3rd Qu.:50.83                     
##  Max.   :1.500   Max.   :93.70                     
## 

Carbohydrates, Sugar and Potassium each have a minimum of -1 in the summary above. A negative nutrient amount is impossible, so these values are most likely placeholders for missing data, and we replace them with NA.
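
The replacement described above can be sketched in base R as follows. The data frame `cereal` and its values are made up for illustration; only the three column names come from this report.

```r
# Toy data frame in which -1 stands in for a missing value (illustrative values)
cereal <- data.frame(
  Carbohydrates = c(5, 8, -1, 14),
  Sugar         = c(6, -1, 5, 0),
  Potassium     = c(280, 135, -1, -1)
)

# Replace every negative entry in the three affected columns with NA
cols <- c("Carbohydrates", "Sugar", "Potassium")
cereal[cols] <- lapply(cereal[cols], function(x) replace(x, x < 0, NA))

# Count the missing values per column
colSums(is.na(cereal))
```

Restricting the replacement to the named columns keeps any column in which a negative number would be legitimate untouched.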

##      Name           Manufacturer   Type       Calories        Protein     
##  Length:77          A: 1         Cold:74   Min.   : 50.0   Min.   :1.000  
##  Class :character   G:22         Hot : 3   1st Qu.:100.0   1st Qu.:2.000  
##  Mode  :character   K:23                   Median :110.0   Median :3.000  
##                     N: 6                   Mean   :106.9   Mean   :2.545  
##                     P: 9                   3rd Qu.:110.0   3rd Qu.:3.000  
##                     Q: 8                   Max.   :160.0   Max.   :6.000  
##                     R: 8                                                  
##       Fat            Sodium          Fibre        Carbohydrates 
##  Min.   :0.000   Min.   :  0.0   Min.   : 0.000   Min.   : 5.0  
##  1st Qu.:0.000   1st Qu.:130.0   1st Qu.: 1.000   1st Qu.:12.0  
##  Median :1.000   Median :180.0   Median : 2.000   Median :14.5  
##  Mean   :1.013   Mean   :159.7   Mean   : 2.152   Mean   :14.8  
##  3rd Qu.:2.000   3rd Qu.:210.0   3rd Qu.: 3.000   3rd Qu.:17.0  
##  Max.   :5.000   Max.   :320.0   Max.   :14.000   Max.   :23.0  
##                                                   NA's   :1     
##      Sugar          Potassium         Vitamins      Shelf      Weight    
##  Min.   : 0.000   Min.   : 15.00   Min.   :  0.00   1:20   Min.   :0.50  
##  1st Qu.: 3.000   1st Qu.: 42.50   1st Qu.: 25.00   2:21   1st Qu.:1.00  
##  Median : 7.000   Median : 90.00   Median : 25.00   3:36   Median :1.00  
##  Mean   : 7.026   Mean   : 98.67   Mean   : 28.25          Mean   :1.03  
##  3rd Qu.:11.000   3rd Qu.:120.00   3rd Qu.: 25.00          3rd Qu.:1.00  
##  Max.   :15.000   Max.   :330.00   Max.   :100.00          Max.   :1.50  
##  NA's   :1        NA's   :2                                              
##       Cups           Rating      Manufacturer_Name 
##  Min.   :0.250   Min.   :18.04   Length:77         
##  1st Qu.:0.670   1st Qu.:33.17   Class :character  
##  Median :0.750   Median :40.40   Mode  :character  
##  Mean   :0.821   Mean   :42.67                     
##  3rd Qu.:1.000   3rd Qu.:50.83                     
##  Max.   :1.500   Max.   :93.70                     
## 

3 Exploratory Data Analysis

3.1 Browsing the Dataset

3.3 Nutritionals

3.3.2 Add nutritionals per ounce and 100g

## Rows: 77
## Columns: 17
## $ Name              <chr> "100% Bran", "100% Natural Bran", "All-Bran", "All-…
## $ Manufacturer      <fct> N, Q, K, K, R, G, K, G, R, P, Q, G, G, G, G, R, K, …
## $ Type              <fct> Cold, Cold, Cold, Cold, Cold, Cold, Cold, Cold, Col…
## $ Calories          <dbl> 70, 120, 70, 50, 110, 110, 110, 130, 90, 90, 120, 1…
## $ Protein           <dbl> 4, 3, 4, 4, 2, 2, 2, 3, 2, 3, 1, 6, 1, 3, 1, 2, 2, …
## $ Fat               <dbl> 1, 5, 1, 0, 2, 2, 0, 2, 1, 0, 2, 2, 3, 2, 1, 0, 0, …
## $ Sodium            <dbl> 130, 15, 260, 140, 200, 180, 125, 210, 200, 210, 22…
## $ Fibre             <dbl> 10.0, 2.0, 9.0, 14.0, 1.0, 1.5, 1.0, 2.0, 4.0, 5.0,…
## $ Carbohydrates     <dbl> 5.0, 8.0, 7.0, 8.0, 14.0, 10.5, 11.0, 18.0, 15.0, 1…
## $ Sugar             <dbl> 6, 8, 5, 0, 8, 10, 14, 8, 6, 5, 12, 1, 9, 7, 13, 3,…
## $ Potassium         <dbl> 280, 135, 320, 330, NA, 70, 30, 100, 125, 190, 35, …
## $ Vitamins          <dbl> 25, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, …
## $ Shelf             <fct> 3, 3, 3, 3, 3, 1, 2, 3, 1, 3, 2, 1, 2, 3, 2, 1, 1, …
## $ Weight            <dbl> 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.33, 1.0…
## $ Cups              <dbl> 0.33, 1.00, 0.33, 0.50, 0.75, 0.75, 1.00, 0.75, 0.6…
## $ Rating            <dbl> 68.40297, 33.98368, 59.42551, 93.70491, 34.38484, 2…
## $ Manufacturer_Name <chr> "Nabisco", "Quaker Oats", "Kellogs", "Kellogs", "Ra…
## Rows: 77
## Columns: 35
## $ Name               <chr> "100% Bran", "100% Natural Bran", "All-Bran", "All…
## $ Manufacturer       <fct> N, Q, K, K, R, G, K, G, R, P, Q, G, G, G, G, R, K,…
## $ Type               <fct> Cold, Cold, Cold, Cold, Cold, Cold, Cold, Cold, Co…
## $ Calories           <dbl> 70, 120, 70, 50, 110, 110, 110, 130, 90, 90, 120, …
## $ Protein            <dbl> 4, 3, 4, 4, 2, 2, 2, 3, 2, 3, 1, 6, 1, 3, 1, 2, 2,…
## $ Fat                <dbl> 1, 5, 1, 0, 2, 2, 0, 2, 1, 0, 2, 2, 3, 2, 1, 0, 0,…
## $ Sodium             <dbl> 130, 15, 260, 140, 200, 180, 125, 210, 200, 210, 2…
## $ Fibre              <dbl> 10.0, 2.0, 9.0, 14.0, 1.0, 1.5, 1.0, 2.0, 4.0, 5.0…
## $ Carbohydrates      <dbl> 5.0, 8.0, 7.0, 8.0, 14.0, 10.5, 11.0, 18.0, 15.0, …
## $ Sugar              <dbl> 6, 8, 5, 0, 8, 10, 14, 8, 6, 5, 12, 1, 9, 7, 13, 3…
## $ Potassium          <dbl> 280, 135, 320, 330, NA, 70, 30, 100, 125, 190, 35,…
## $ Vitamins           <dbl> 25, 0, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,…
## $ Shelf              <fct> 3, 3, 3, 3, 3, 1, 2, 3, 1, 3, 2, 1, 2, 3, 2, 1, 1,…
## $ Weight             <dbl> 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.33, 1.…
## $ Cups               <dbl> 0.33, 1.00, 0.33, 0.50, 0.75, 0.75, 1.00, 0.75, 0.…
## $ Rating             <dbl> 68.40297, 33.98368, 59.42551, 93.70491, 34.38484, …
## $ Manufacturer_Name  <chr> "Nabisco", "Quaker Oats", "Kellogs", "Kellogs", "R…
## $ Calories_oz        <dbl> 70.0, 120.0, 70.0, 50.0, 110.0, 110.0, 110.0, 172.…
## $ Protein_oz         <dbl> 4.00, 3.00, 4.00, 4.00, 2.00, 2.00, 2.00, 3.99, 2.…
## $ Fat_oz             <dbl> 1.00, 5.00, 1.00, 0.00, 2.00, 2.00, 0.00, 2.66, 1.…
## $ Sodium_oz          <dbl> 130.0, 15.0, 260.0, 140.0, 200.0, 180.0, 125.0, 27…
## $ Fibre_oz           <dbl> 10.00, 2.00, 9.00, 14.00, 1.00, 1.50, 1.00, 2.66, …
## $ Carbohydrates_oz   <dbl> 5.00, 8.00, 7.00, 8.00, 14.00, 10.50, 11.00, 23.94…
## $ Sugar_oz           <dbl> 6.00, 8.00, 5.00, 0.00, 8.00, 10.00, 14.00, 10.64,…
## $ Potassium_oz       <dbl> 280, 135, 320, 330, NA, 70, 30, 133, 125, 190, 35,…
## $ Vitamins_oz        <dbl> 25.00, 0.00, 25.00, 25.00, 25.00, 25.00, 25.00, 33…
## $ Calories_100g      <dbl> 247, 423, 247, 176, 388, 388, 388, 610, 317, 317, …
## $ Protein_100g       <dbl> 14.1, 10.6, 14.1, 14.1, 7.1, 7.1, 7.1, 14.1, 7.1, …
## $ Fat_100g           <dbl> 3.5, 17.6, 3.5, 0.0, 7.1, 7.1, 0.0, 9.4, 3.5, 0.0,…
## $ Sodium_100g        <dbl> 458.6, 52.9, 917.1, 493.8, 705.5, 634.9, 440.9, 98…
## $ Fibre_100g         <dbl> 35.3, 7.1, 31.7, 49.4, 3.5, 5.3, 3.5, 9.4, 14.1, 1…
## $ Carbohydrates_100g <dbl> 17.6, 28.2, 24.7, 28.2, 49.4, 37.0, 38.8, 84.4, 52…
## $ Sugar_100g         <dbl> 21.2, 28.2, 17.6, 0.0, 28.2, 35.3, 49.4, 37.5, 21.…
## $ Potassium_100g     <dbl> 987.7, 476.2, 1128.8, 1164.0, NA, 246.9, 105.8, 46…
## $ Vitamins_100g      <dbl> 88.2, 0.0, 88.2, 88.2, 88.2, 88.2, 88.2, 117.3, 88…
##      Name           Manufacturer   Type       Calories        Protein     
##  Length:77          A: 1         Cold:74   Min.   : 50.0   Min.   :1.000  
##  Class :character   G:22         Hot : 3   1st Qu.:100.0   1st Qu.:2.000  
##  Mode  :character   K:23                   Median :110.0   Median :3.000  
##                     N: 6                   Mean   :106.9   Mean   :2.545  
##                     P: 9                   3rd Qu.:110.0   3rd Qu.:3.000  
##                     Q: 8                   Max.   :160.0   Max.   :6.000  
##                     R: 8                                                  
##       Fat            Sodium          Fibre        Carbohydrates 
##  Min.   :0.000   Min.   :  0.0   Min.   : 0.000   Min.   : 5.0  
##  1st Qu.:0.000   1st Qu.:130.0   1st Qu.: 1.000   1st Qu.:12.0  
##  Median :1.000   Median :180.0   Median : 2.000   Median :14.5  
##  Mean   :1.013   Mean   :159.7   Mean   : 2.152   Mean   :14.8  
##  3rd Qu.:2.000   3rd Qu.:210.0   3rd Qu.: 3.000   3rd Qu.:17.0  
##  Max.   :5.000   Max.   :320.0   Max.   :14.000   Max.   :23.0  
##                                                   NA's   :1     
##      Sugar          Potassium         Vitamins      Shelf      Weight    
##  Min.   : 0.000   Min.   : 15.00   Min.   :  0.00   1:20   Min.   :0.50  
##  1st Qu.: 3.000   1st Qu.: 42.50   1st Qu.: 25.00   2:21   1st Qu.:1.00  
##  Median : 7.000   Median : 90.00   Median : 25.00   3:36   Median :1.00  
##  Mean   : 7.026   Mean   : 98.67   Mean   : 28.25          Mean   :1.03  
##  3rd Qu.:11.000   3rd Qu.:120.00   3rd Qu.: 25.00          3rd Qu.:1.00  
##  Max.   :15.000   Max.   :330.00   Max.   :100.00          Max.   :1.50  
##  NA's   :1        NA's   :2                                              
##       Cups           Rating      Manufacturer_Name   Calories_oz   
##  Min.   :0.250   Min.   :18.04   Length:77          Min.   : 25.0  
##  1st Qu.:0.670   1st Qu.:33.17   Class :character   1st Qu.:100.0  
##  Median :0.750   Median :40.40   Mode  :character   Median :110.0  
##  Mean   :0.821   Mean   :42.67                      Mean   :112.1  
##  3rd Qu.:1.000   3rd Qu.:50.83                      3rd Qu.:110.0  
##  Max.   :1.500   Max.   :93.70                      Max.   :240.0  
##                                                                    
##    Protein_oz        Fat_oz        Sodium_oz        Fibre_oz     
##  Min.   :0.500   Min.   :0.000   Min.   :  0.0   Min.   : 0.000  
##  1st Qu.:2.000   1st Qu.:0.000   1st Qu.:130.0   1st Qu.: 0.500  
##  Median :3.000   Median :1.000   Median :180.0   Median : 2.000  
##  Mean   :2.656   Mean   :1.075   Mean   :168.2   Mean   : 2.303  
##  3rd Qu.:3.750   3rd Qu.:2.000   3rd Qu.:225.0   3rd Qu.: 3.000  
##  Max.   :6.000   Max.   :5.000   Max.   :320.0   Max.   :14.000  
##                                                                  
##  Carbohydrates_oz    Sugar_oz       Potassium_oz    Vitamins_oz    
##  Min.   : 5.00    Min.   : 0.000   Min.   :  7.5   Min.   :  0.00  
##  1st Qu.:12.00    1st Qu.: 3.000   1st Qu.: 40.0   1st Qu.: 25.00  
##  Median :15.00    Median : 7.000   Median : 90.0   Median : 25.00  
##  Mean   :15.33    Mean   : 7.535   Mean   :106.1   Mean   : 30.15  
##  3rd Qu.:18.16    3rd Qu.:11.175   3rd Qu.:129.0   3rd Qu.: 25.00  
##  Max.   :27.93    Max.   :21.000   Max.   :345.8   Max.   :150.00  
##  NA's   :1        NA's   :1        NA's   :2                       
##  Calories_100g    Protein_100g       Fat_100g      Sodium_100g    
##  Min.   : 88.0   Min.   : 1.800   Min.   : 0.00   Min.   :   0.0  
##  1st Qu.:353.0   1st Qu.: 7.100   1st Qu.: 0.00   1st Qu.: 458.6  
##  Median :388.0   Median :10.600   Median : 3.50   Median : 634.9  
##  Mean   :395.3   Mean   : 9.384   Mean   : 3.79   Mean   : 593.5  
##  3rd Qu.:388.0   3rd Qu.:13.200   3rd Qu.: 7.10   3rd Qu.: 793.7  
##  Max.   :847.0   Max.   :21.200   Max.   :17.60   Max.   :1128.8  
##                                                                   
##    Fibre_100g     Carbohydrates_100g   Sugar_100g    Potassium_100g  
##  Min.   : 0.000   Min.   :17.60      Min.   : 0.00   Min.   :  26.5  
##  1st Qu.: 1.800   1st Qu.:42.30      1st Qu.:10.60   1st Qu.: 141.1  
##  Median : 7.100   Median :52.90      Median :24.70   Median : 317.5  
##  Mean   : 8.127   Mean   :54.06      Mean   :26.58   Mean   : 374.3  
##  3rd Qu.:10.600   3rd Qu.:64.05      3rd Qu.:39.42   3rd Qu.: 455.0  
##  Max.   :49.400   Max.   :98.50      Max.   :74.10   Max.   :1219.8  
##                   NA's   :1          NA's   :1       NA's   :2       
##  Vitamins_100g  
##  Min.   :  0.0  
##  1st Qu.: 88.2  
##  Median : 88.2  
##  Mean   :106.3  
##  3rd Qu.: 88.2  
##  Max.   :529.1  
## 
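
The per-ounce and per-100 g columns listed above can be derived from the per-serving values and the serving weight. The sketch below is one way to do this in base R, assuming 1 oz ≈ 28.35 g; the constant `GRAMS_PER_OUNCE` and the helper objects are hypothetical names. Note that in the listing above the eighth cereal (Protein 3, Weight 1.33) shows Protein_oz 3.99, which equals 3 × 1.33 rather than 3 / 1.33 ≈ 2.26, so the report's columns may have been obtained by multiplying by Weight. The sketch divides by Weight, which is what a per-ounce value requires.

```r
GRAMS_PER_OUNCE <- 28.35  # 1 oz is roughly 28.35 g

# Per-serving values of the first and eighth cereals in the listing above
cereal <- data.frame(
  Calories = c(70, 130),
  Protein  = c(4, 3),
  Weight   = c(1.00, 1.33)  # serving weight in ounces
)

nutrients <- c("Calories", "Protein")

# Per ounce: divide the per-serving amount by the serving weight
per_oz <- cereal[nutrients] / cereal$Weight
names(per_oz) <- paste0(nutrients, "_oz")

# Per 100 g: rescale the (unrounded) per-ounce amount
per_100g <- per_oz * 100 / GRAMS_PER_OUNCE
names(per_100g) <- paste0(nutrients, "_100g")

cereal <- cbind(cereal, round(per_oz, 2), round(per_100g, 1))
cereal
```

For a 1 oz serving both approaches agree, which is why most rows are unaffected; the difference only shows for servings that are heavier or lighter than one ounce.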

3.3.5 Calories

3.3.5.1 Summary of Calorie Content

There appear to be mistakes in the dataset's calorie values: some products have implausibly few calories (under 90 kcal per 100 g), while others come close to the maximum possible energy density of about 900 kcal per 100 g, which is that of pure fat.

Fat provides about 9 kcal/g, while protein and carbohydrates each provide about 4 kcal/g. We therefore recalculate the calories from the available nutritional data and replot the histogram of calories per 100 g.
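
The recalculation described above amounts to a weighted sum of the macronutrient amounts. The following is a minimal sketch; the function name `recalc_calories` and the constant `KCAL_PER_G` are hypothetical. The example values are the per-100 g figures of the single American Home Food Products cereal as they appear in the per-manufacturer tables later in this report.

```r
# Energy per gram of each macronutrient (Atwater general factors)
KCAL_PER_G <- c(Fat = 9, Protein = 4, Carbohydrates = 4)

# Estimate kcal per 100 g from grams of each macronutrient per 100 g
recalc_calories <- function(fat_g, protein_g, carb_g) {
  KCAL_PER_G[["Fat"]] * fat_g +
    KCAL_PER_G[["Protein"]] * protein_g +
    KCAL_PER_G[["Carbohydrates"]] * carb_g
}

# Fat 3.5 g, protein 14.1 g, complex carbohydrates 56.4 g per 100 g
recalc_calories(fat_g = 3.5, protein_g = 14.1, carb_g = 56.4)
```

This gives about 313.5 kcal per 100 g, in line with the 314 reported for that manufacturer. The match is obtained without adding the sugar column, which the dataset records separately from complex carbohydrates; whether sugar should also be counted at 4 kcal/g is a modelling choice that this sketch leaves open.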

## # A tibble: 7 x 6
##   Manufacturer_Name           Average Median Lowest Highest Count
##   <chr>                         <dbl>  <dbl>  <dbl>   <dbl> <int>
## 1 American Home Food Products    314.   314.  314.     314.     1
## 2 General Mills                  299.   288.  215.     479.    22
## 3 Kellogs                        296.   296.  169.     535     23
## 4 Nabisco                        264.   275.  158.     339.     6
## 5 Post                           265.   282.  184.     344.     9
## 6 Quaker Oats                    224.   247.   84.4    314.     8
## 7 Ralston Purina                 324.   326.  272.     377.     8
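
Per-manufacturer tables like the one above (average, median, lowest, highest and count) can be produced by splitting a column by manufacturer and summarising each group. The tibble header suggests the report used dplyr for this; the sketch below is a dependency-free base R equivalent, with a hypothetical helper `summarise_by_mfr` and made-up input values.

```r
# Toy data: kcal per 100 g for a few cereals (illustrative values only)
cereal <- data.frame(
  Manufacturer_Name = c("Nabisco", "Nabisco", "Post", "Post", "Post"),
  Calories_100g     = c(158, 339, 184, 282, 344)
)

# One row per manufacturer with the five statistics used in the tables
summarise_by_mfr <- function(df, column) {
  groups <- split(df[[column]], df$Manufacturer_Name)
  rows <- lapply(groups, function(x) {
    data.frame(
      Average = mean(x, na.rm = TRUE),
      Median  = median(x, na.rm = TRUE),
      Lowest  = min(x, na.rm = TRUE),
      Highest = max(x, na.rm = TRUE),
      Count   = length(x)
    )
  })
  out <- do.call(rbind, rows)          # row names are the group names
  out <- cbind(Manufacturer_Name = rownames(out), out)
  rownames(out) <- NULL
  out
}

summarise_by_mfr(cereal, "Calories_100g")
```

Passing the column name as an argument lets the same helper serve every nutrient section that follows.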

3.3.5.2 Distribution of Calorie Content

3.3.6 Fat

3.3.6.1 Summary of Fat Content

## # A tibble: 7 x 6
##   Manufacturer_Name           Average Median Lowest Highest Count
##   <chr>                         <dbl>  <dbl>  <dbl>   <dbl> <int>
## 1 American Home Food Products     3.5    3.5    3.5     3.5     1
## 2 General Mills                   5.1    3.5    3.5    10.6    22
## 3 Kellogs                         2.5    0      0      10.6    23
## 4 Nabisco                         0.6    0      0       3.5     6
## 5 Post                            3.5    3.5    0      10.6     9
## 6 Quaker Oats                     6.2    7.1    0      17.6     8
## 7 Ralston Purina                  4.4    3.5    0      10.6     8

3.3.6.2 Distribution of Fat Content

3.3.7 Protein

3.3.7.1 Summary of Protein Content

## # A tibble: 7 x 6
##   Manufacturer_Name           Average Median Lowest Highest Count
##   <chr>                         <dbl>  <dbl>  <dbl>   <dbl> <int>
## 1 American Home Food Products    14.1   14.1   14.1    14.1     1
## 2 General Mills                   8.7    7.1    3.5    21.2    22
## 3 Kellogs                        10.2   10.6    3.5    21.2    23
## 4 Nabisco                         9.8   10.6    5.9    14.1     6
## 5 Post                            9.3   10.6    3.5    14.1     9
## 6 Quaker Oats                     8.6    7      1.8    17.6     8
## 7 Ralston Purina                  8.8    7.1    3.5    14.1     8

3.3.7.2 Distribution of Protein Content

3.3.8 Carbohydrates

3.3.8.1 Summary of Carbohydrates Content

## # A tibble: 7 x 6
##   Manufacturer_Name           Average Median Lowest Highest Count
##   <chr>                         <dbl>  <dbl>  <dbl>   <dbl> <int>
## 1 American Home Food Products    56.4   56.4   56.4    56.4     1
## 2 General Mills                  54.6   52.9   37      84.4    22
## 3 Kellogs                        58.1   56.4   24.7    98.5    23
## 4 Nabisco                        54.8   60     17.6    74.1     6
## 5 Post                           49.3   49.4   38.8    60       9
## 6 Quaker Oats                    35     42.3   17.6    49.4     8
## 7 Ralston Purina                 62.2   58.2   49.4    81.1     8

3.3.8.2 Distribution of Carbohydrates Content

3.3.9 Sugar

3.3.9.1 Summary of Sugar Content

## # A tibble: 7 x 6
##   Manufacturer_Name           Average Median Lowest Highest Count
##   <chr>                         <dbl>  <dbl>  <dbl>   <dbl> <int>
## 1 American Home Food Products    10.6   10.6   10.6    10.6     1
## 2 General Mills                  30     31.7    3.5    74.1    22
## 3 Kellogs                        29.7   24.7    0      68.8    23
## 4 Nabisco                         6.5    0      0      21.2     6
## 5 Post                           33.7   38.8   10.6    65.7     9
## 6 Quaker Oats                    21.7   21.2    0      42.3     8
## 7 Ralston Purina                 21.6   19.4    7.1    38.8     8

3.3.9.2 Distribution of Sugar Content

3.3.10 Fibre

3.3.10.1 Summary of Fibre Content

## # A tibble: 7 x 6
##   Manufacturer_Name           Average Median Lowest Highest Count
##   <chr>                         <dbl>  <dbl>  <dbl>   <dbl> <int>
## 1 American Home Food Products     0      0      0       0       1
## 2 General Mills                   5      5.3    0      21.2    22
## 3 Kellogs                        10.6    3.5    0      49.4    23
## 4 Nabisco                        13.8   10.6    3.5    35.3     6
## 5 Post                           11.1   10.6    0      28.1     9
## 6 Quaker Oats                     4.5    5.3    0       9.5     8
## 7 Ralston Purina                  6.6    7      0      14.1     8

3.3.10.2 Distribution of Fibre Content

3.3.11 Sodium

3.3.11.1 Summary of Sodium Content

## # A tibble: 7 x 6
##   Manufacturer_Name           Average Median Lowest Highest Count
##   <chr>                         <dbl>  <dbl>  <dbl>   <dbl> <int>
## 1 American Home Food Products      0     0       0       0      1
## 2 General Mills                  740.  706.    494.   1023.    22
## 3 Kellogs                        670.  706.      0    1129.    23
## 4 Nabisco                        132.   26.4     0     459.     6
## 5 Post                           557.  600.    159.    938.     9
## 6 Quaker Oats                    326.  265.      0     776      8
## 7 Ralston Purina                 699.  706.    335.    988.     8

3.3.11.2 Distribution of Sodium Content

3.3.12 Potassium

3.3.12.1 Summary of Potassium Content

## # A tibble: 7 x 6
##   Manufacturer_Name           Average Median Lowest Highest Count
##   <chr>                         <dbl>  <dbl>  <dbl>   <dbl> <int>
## 1 American Home Food Products    335.   335.  335.     335.     1
## 2 General Mills                  329.   282.   88.2   1217     22
## 3 Kellogs                        408.   212.   70.5   1164     23
## 4 Nabisco                        500.   423.  278.     988.     6
## 5 Post                           455    318.   88.2   1220.     9
## 6 Quaker Oats                    248    247.   26.5    476.     8
## 7 Ralston Purina                 360.   406.   88.2    600.     8

3.3.12.2 Distribution of Potassium Content

4 Unsupervised Learning

4.1 Principal Component Analysis (PCA)

## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     1.7585 1.5749 1.2921 0.94776 0.70543 0.55139 0.21930
## Proportion of Variance 0.3436 0.2756 0.1855 0.09981 0.05529 0.03378 0.00534
## Cumulative Proportion  0.3436 0.6192 0.8047 0.90448 0.95977 0.99356 0.99890
##                            PC8       PC9
## Standard deviation     0.09954 1.821e-16
## Proportion of Variance 0.00110 0.000e+00
## Cumulative Proportion  1.00000 1.000e+00
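
The ninth component above has a standard deviation of about 1.8e-16, which is zero to machine precision. That pattern typically appears when one input variable is an exact linear combination of others. If the PCA included the calories recalculated from fat, protein and carbohydrates (an assumption, since the input columns are not shown), that would explain it. The sketch below reproduces the effect with `prcomp()` on the built-in `iris` data as a stand-in; the column `Combo` is hypothetical.

```r
# Built-in data as a stand-in for the nutrient columns
X <- iris[, 1:3]

# Add a column that is an exact linear combination of the others,
# mimicking calories recomputed from three macronutrients
X$Combo <- 9 * X[[1]] + 4 * X[[2]] + 4 * X[[3]]

# PCA on centred, unit-variance variables
pca <- prcomp(X, center = TRUE, scale. = TRUE)
summary(pca)$importance

# The last component carries no variance beyond rounding error
pca$sdev[4]
```

Dropping the derived column (or the columns it was derived from) before the PCA removes the degenerate component without changing the remaining ones in any meaningful way.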

4.2 K-means Clustering

We need to choose the number of clusters. A common guide is the total within-cluster sum of squares (WSS), which measures how tightly the observations sit around their cluster centres.

Because WSS always decreases as clusters are added, we plot it against the number of clusters (a scree plot) and select the number of clusters beyond which WSS improves only slowly.
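
The procedure above can be sketched with `kmeans()` from base R. The built-in `iris` data stands in for the cereal nutrients, and the choices of `nstart = 25`, the seed, and scaling the inputs are assumptions of this sketch rather than settings taken from the report.

```r
set.seed(42)  # k-means starts from random centres

# Built-in data as a stand-in; scaled so that no variable dominates the distances
X <- scale(iris[, 1:4])

# Total within-cluster sum of squares for k = 1..10
ks  <- 1:10
wss <- sapply(ks, function(k) kmeans(X, centers = k, nstart = 25)$tot.withinss)

# Scree plot: look for the k after which the curve flattens
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Scaling matters when the variables are on very different scales, as with sodium and potassium in milligrams next to fat in grams: without it, the largest-valued columns drive the distances and therefore the clusters.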

Judging by the scree plot, 3 or 4 clusters seem to be a reasonable choice. The output below uses 3.

## K-means clustering with 3 clusters of sizes 15, 48, 11
## 
## Cluster means:
##   Calories   Protein      Fat   Sodium     Fibre Carbohydrates    Sugar
## 1 242.8400  8.286667 3.053333  95.2400  6.593333      45.55333 20.22000
## 2 291.5437  8.514583 3.777083 710.0917  4.752083      55.87292 25.71667
## 3 322.9182 14.027273 4.590909 836.0273 25.663636      56.37273 41.31818
##   Potassium   Rating
## 1  298.9667 52.42421
## 2  259.5208 37.82648
## 3  976.6545 48.49800
## 
## Clustering vector:
##  [1] 3 1 3 3 2 2 2 2 3 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 3 3 2 1 2 2 2 1 2 2 2 2 2
## [39] 2 2 2 1 1 2 3 2 2 3 2 2 3 2 1 1 2 3 2 1 2 2 1 1 1 1 2 1 2 3 2 2 2 2 2 2
## 
## Within cluster sum of squares by cluster:
## [1]  764161.6 2725525.3 1130535.5
##  (between_SS / total_SS =  67.9 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
## K-means clustering with 3 clusters of sizes 27, 11, 36
## 
## Cluster means:
##   Calories   Protein      Fat   Sodium     Fibre Carbohydrates    Sugar
## 1 250.9519  8.788889 4.048148 269.7704  6.607407      44.84074 25.73704
## 2 322.9182 14.027273 4.590909 836.0273 25.663636      56.37273 41.31818
## 3 301.6944  8.213889 3.272222 784.1444  4.127778      59.84722 23.41111
##   Potassium   Rating
## 1  309.1481 46.12256
## 2  976.6545 48.49800
## 3  238.7361 37.68681
## 
## Clustering vector:
##  [1] 2 1 2 2 3 1 3 3 2 3 3 3 1 3 3 3 1 3 1 3 1 3 1 3 1 2 2 1 1 3 1 3 1 3 3 3 3 3
## [39] 3 1 3 1 1 1 2 3 3 2 3 3 2 3 1 1 1 2 1 1 3 3 1 1 1 1 3 1 3 2 3 3 1 3 3 3
## 
## Within cluster sum of squares by cluster:
## [1] 2226440 1130535 1429583
##  (between_SS / total_SS =  66.8 %)
## 
## Available components:
## 
##  [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
##  [6] "betweenss"    "size"         "iter"         "ifault"       "silinfo"     
## [11] "nbclust"      "data"